Students will learn to conduct
statistical analyses on a variety of datasets using a
wide range of modern computational methods. The
students will learn how to recognize which tools
are needed to analyze different types of datasets,
how to apply these tools in each case, and how to
employ diagnostics to assess the quality of their
results. They will learn about statistical models, their
complexity and their relative benefits depending
on the available data. Some of the tools that
students will learn well include simple and
multiple linear regression, nearest-neighbor methods,
shrinkage methods (ridge, lasso), dimension
reduction methods (principal components), logistic
regression, linear discriminant analysis, tree-based
methods, criterion-based and resampling-based
model selection, and clustering.
The focus of the course will be less on theory
and more on providing the students with as much
intuition as possible and acquainting them with as
many methods as possible. The course will make
substantial use of the R statistical programming
language and its libraries.
Outcome: Not Provided